D-HOTM: distributed higher order text mining
Author
Abstract
We present D-HOTM, a framework for Distributed Higher Order Text Mining based on named entities extracted from textual data stored in distributed relational databases. Unlike existing algorithms, D-HOTM requires neither full knowledge of the global schema nor that the distribution of data be horizontal or vertical. D-HOTM discovers rules based on higher-order associations between distributed database records containing the extracted entities. A theoretical framework for reasoning about record linkage is provided to support the discovery of higher-order associations. To handle errors in record linkage, the traditional evaluation metrics employed in association rule mining (ARM) are extended. The implementation of D-HOTM is based on the TMI [29] and was tested on a cluster at the National Center for Supercomputing Applications (NCSA). Results on a dataset simulating an important DEA methamphetamine case demonstrate the relevance of D-HOTM to law enforcement and homeland defense.
Similar resources
Distributed Higher Order Text Mining
The burgeoning amount of textual data in distributed sources combined with the obstacles involved in creating and maintaining central repositories motivates the need for effective distributed information extraction and mining techniques. Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (D-ARM) algorithms have been developed. The...
From HOTs to Self-Representing States
According to David Rosenthal, a mental state is conscious just in case its subject suitably represents herself as being in that state, where this entails that the mental state "is accompanied by a noninferential, nondispositional, assertoric thought to the effect that one is in that very state" (2002a, p. 410; see also Rosenthal, 1997, p. 742). This assertoric thought, since it is about anoth...
Competitive Intelligence Text Mining: Words Speak
Competitive intelligence (CI) has become one of the major subjects for researchers in recent years. The present research aims to achieve a part of CI by investigating the scientific articles in this field through text mining in three interrelated steps. In the first step, a total of 1143 articles released between 1987 and 2016 were selected by searching the phrase "competitive intellige...
A very-short-text clustering method based on distributed representation to identifying research capabilities of a Higher Education Institution
Purpose. Text documents are an important source of data for tech mining techniques. Usually, text databases include documents sufficiently long to apply conventional text mining techniques. However, in some tech mining tasks, such as the capabilities identification process, we have databases with very short texts, which represent a challenge for conventional text mining techniques. The problem has to d...
MOMEMI: Modern Methods of Data Mining
Modern data mining is used to classify and to discover relationships in big data sets. The papers presented in the framework of MOMEMI deal with the most important fields of modern data mining: determination and use of patterns and templates, incremental reasoning, geometrical associations, and text mining. Keywords-data mining; classification; forecast; cluster; association...